23 research outputs found

    Agents, Bookmarks and Clicks: A topical model of Web traffic

    Analysis of aggregate and individual Web traffic has shown that PageRank is a poor model of how people navigate the Web. Using the empirical traffic patterns generated by a thousand users, we characterize several properties of Web traffic that cannot be reproduced by Markovian models. We examine both aggregate statistics capturing collective behavior, such as page and link traffic, and individual statistics, such as entropy and session size. No model currently explains all of these empirical observations simultaneously. We show that all of these traffic patterns can be explained by an agent-based model that takes into account several realistic browsing behaviors. First, agents maintain individual lists of bookmarks (a non-Markovian memory mechanism) that are used as teleportation targets. Second, agents can retreat along visited links, a branching mechanism that also allows us to reproduce behaviors such as the use of a back button and tabbed browsing. Finally, agents are sustained by visiting novel pages of topical interest, with adjacent pages being more topically related to each other than distant ones. This modulates the probability that an agent continues to browse or starts a new session, allowing us to recreate heterogeneous session lengths. The resulting model is capable of reproducing the collective and individual behaviors we observe in the empirical data, reconciling the narrowly focused browsing patterns of individual users with the extreme heterogeneity of aggregate traffic measurements. This result allows us to identify a few salient features that are necessary and sufficient to interpret the browsing patterns observed in our data. In addition to the descriptive and explanatory power of such a model, our results may lead the way to more sophisticated, realistic, and effective ranking and crawling algorithms.
    Comment: 10 pages, 16 figures, 1 table - Long version of paper to appear in Proceedings of the 21st ACM Conference on Hypertext and Hypermedia
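The three mechanisms named in this abstract (bookmarks as teleportation targets, a back button, and novelty-driven continuation) can be illustrated with a minimal sketch. It is not the authors' model: the graph, the function name `simulate_agent`, and every parameter (`p_back`, `p_bookmark`, `cost`, `reward`, `start_energy`) are hypothetical, and topical similarity between pages is replaced by a simple "energy" reward for pages the agent has not seen before.

```python
import random
from collections import Counter

# Hypothetical toy link graph: page -> list of pages it links to.
toy_graph = {
    "home": ["news", "blog"],
    "news": ["home", "sports", "tech"],
    "blog": ["home", "tech"],
    "sports": ["news"],
    "tech": ["news", "blog", "docs"],
    "docs": [],
}

def simulate_agent(graph, steps, p_back=0.3, p_bookmark=0.2,
                   cost=1.0, reward=2.0, start_energy=3.0, seed=0):
    """Toy browsing agent; all parameters are illustrative assumptions."""
    rng = random.Random(seed)
    bookmarks = [rng.choice(sorted(graph))]  # personal teleportation targets
    seen = set()          # pages this agent has already visited
    traffic = Counter()   # number of visits per page
    sessions = []         # length (in visits) of each session
    history = []          # pages we can retreat to with the back button
    current, energy, length = None, 0.0, 0

    for _ in range(steps):
        stuck = current is not None and not history and not graph[current]
        if current is None or energy <= 0 or stuck:
            # Start a new session by jumping to a bookmark.
            if length:
                sessions.append(length)
            current = rng.choice(bookmarks)
            history, energy, length = [], start_energy, 0
        elif history and (not graph[current] or rng.random() < p_back):
            # Back button: retreat along the most recently followed link.
            current = history.pop()
        else:
            # Follow a random outgoing link from the current page.
            history.append(current)
            current = rng.choice(graph[current])

        traffic[current] += 1
        length += 1
        energy -= cost                # every visit costs energy
        if current not in seen:       # novel pages sustain the agent
            seen.add(current)
            energy += reward
            if rng.random() < p_bookmark:
                bookmarks.append(current)

    if length:
        sessions.append(length)
    return traffic, sessions
```

Calling `simulate_agent(toy_graph, 1000)` returns per-page visit counts and a list of session lengths. Because revisits drain energy while novel pages replenish it, sessions end once the agent stops finding new pages, which is the qualitative effect the abstract attributes to topical interest.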

    Detecting and Tracking the Spread of Astroturf Memes in Microblog Streams

    Online social media are complementing and in some cases replacing person-to-person social interaction and redefining the diffusion of information. In particular, microblogs have become crucial grounds on which public relations, marketing, and political battles are fought. We introduce an extensible framework that will enable the real-time analysis of meme diffusion in social media by mining, visualizing, mapping, classifying, and modeling massive streams of public microblogging events. We describe a Web service that leverages this framework to track political memes in Twitter and help detect astroturfing, smear campaigns, and other misinformation in the context of U.S. political elections. We present some cases of abusive behaviors uncovered by our service. Finally, we discuss promising preliminary results on the detection of suspicious memes via supervised learning based on features extracted from the topology of the diffusion networks, sentiment analysis, and crowdsourced annotations.

    Long-term real-world experience with ipilimumab and non-ipilimumab therapies in advanced melanoma: the IMAGE study.

    Funder: This work was supported by Bristol Myers Squibb (no grant number is applicable).
    BACKGROUND: Ipilimumab has shown long-term overall survival (OS) in patients with advanced melanoma in clinical trials, but robust real-world evidence is lacking. We present long-term outcomes from the IMAGE study (NCT01511913) in patients receiving ipilimumab and/or non-ipilimumab (any approved treatment other than ipilimumab) systemic therapies.
    METHODS: IMAGE was a multinational, prospective, observational study assessing adult patients with advanced melanoma treated with ipilimumab or non-ipilimumab systemic therapies between June 2012 and March 2015 with ≥3 years of follow-up. Adjusted OS curves based on multivariate Cox regression models included covariate effects. Safety and patient-reported outcomes were assessed.
    RESULTS: Among 1356 patients, 1094 (81%) received ipilimumab and 262 (19%) received non-ipilimumab index therapy (systemic therapy [chemotherapy, anti-programmed death 1 antibodies, or BRAF ± MEK inhibitors], radiotherapy, and radiosurgery). In the overall population, median age was 64 years, 60% were male, 78% were from Europe, and 78% had received previous treatment for advanced melanoma. In the ipilimumab-treated cohort, 780 (71%) patients did not receive subsequent therapy (IPI-noOther) and 314 (29%) received subsequent non-ipilimumab therapy (IPI-Other) on study. In the non-ipilimumab-treated cohort, 205 (78%) patients remained on or received other subsequent non-ipilimumab therapy (Other-Other) and 57 (22%) received subsequent ipilimumab therapy (Other-IPI) on study. Among 1151 patients who received ipilimumab at any time during the study (IPI-noOther, IPI-Other, and Other-IPI), 296 (26%) reported CTCAE grade ≥3 treatment-related adverse events, most occurring in year 1. Ipilimumab-treated and non-ipilimumab-treated patients who switched therapy (IPI-Other and Other-IPI) had longer OS than those who did not switch (IPI-noOther and Other-Other). Patients with prior therapy who did not switch therapy (IPI-noOther and Other-Other) showed similar OS. In treatment-naive patients, those in the IPI-noOther group tended to have longer OS than those in the Other-Other group. Patient-reported outcomes were similar between treatment cohorts.
    CONCLUSIONS: With long-term follow-up (≥3 years), safety and OS in this real-world population of patients treated with ipilimumab 3 mg/kg were consistent with those reported in clinical trials. Patient-reported quality of life was maintained over the study period. OS analysis across both pretreated and treatment-naive patients suggested a beneficial role of ipilimumab early in treatment.
    TRIAL REGISTRATION: ClinicalTrials.gov, NCT01511913. Registered January 19, 2012 - Retrospectively registered, https://clinicaltrials.gov/ct2/show/NCT01511913

    A framework for analysis of anonymized network flow data

    Many projects analyze application overlay networks on the Internet using packet analysis and network flow data. This is infeasible on many networks: either the volume of data makes packet inspection intractable, or privacy concerns forbid packet capture and require the dissociation of network flows from users’ identities. We describe a framework for exploration of usage patterns even under circumstances where the only available data is anonymized flow records. We offer two proofs of concept using data gathered from Internet2. In the first, we uncover distributions and scaling relations in host-to-host networks with implications for capacity planning and application design. In the second, we classify network applications based on properties of their overlay networks, yielding a taxonomy that allows us to identify the functions of unknown applications.

    Visual comparison of search results: A censorship case study

    Understanding the qualitative differences between the sets of results from different search engines can be a difficult task. How many links must you follow from each list before you can reach a conclusion? We describe a user interface that allows users to quickly identify the most significant differences in content between two lists of Web pages. We have implemented this interface in CenSEARCHip, a system for comparing the effects of censorship policies on search engines.

    On the Lack of Typical Behavior in the Global Web Traffic Network

    We offer the first large-scale analysis of Web traffic based on network flow data. Using data collected on the Internet2 network, we constructed a weighted bipartite client-server host graph containing more than 18 × 10^6 vertices and 68 × 10^6 edges valued by relative traffic flows. When considered as a traffic map of the World-Wide Web, the generated graph provides valuable information on the statistical patterns that characterize the global information flow on the Web. Statistical analysis shows that client-server connections and traffic flows exhibit heavy-tailed probability distributions lacking any typical scale. In particular, the absence of an intrinsic average in some of the distributions implies the absence of a prototypical scale appropriate for server design, Web-centric network design, or traffic modeling. The inspection of the amount of traffic handled by clients and servers and their number of connections highlights non-trivial correlations between information flow and patterns of connectivity as well as the presence of anomalous statistical patterns related to the behavior of users on the Web. The results presented here may considerably impact the modeling, scalability analysis, and behavioral study of Web applications.
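The phrase "absence of an intrinsic average" can be made concrete with a small sketch. It draws samples from a Pareto distribution whose tail exponent is low enough that the theoretical mean is infinite, so the sample mean is dominated by a few huge values. The exponent `alpha=0.9` and the helper `pareto_summary` are illustrative assumptions; the abstract does not state the exponents measured on the Internet2 data.

```python
import random
import statistics

def pareto_summary(alpha, n, seed=0):
    """Return (mean, median, max) of n Pareto(alpha) samples.

    alpha is a hypothetical tail exponent chosen for illustration;
    for alpha <= 1 the distribution has no finite mean.
    """
    rng = random.Random(seed)
    samples = [rng.paretovariate(alpha) for _ in range(n)]
    return statistics.mean(samples), statistics.median(samples), max(samples)

if __name__ == "__main__":
    # The median stays near 2 ** (1 / alpha), while the sample mean tends to
    # keep growing with n because rare, very large values dominate it.
    for n in (100, 10_000, 1_000_000):
        mean, median, largest = pareto_summary(alpha=0.9, n=n)
        print(f"n={n:>9}  mean={mean:12.1f}  median={median:6.2f}  max={largest:14.1f}")
```

In such a regime the median is stable but the mean does not settle on any value, which is why no single "typical" client or server load can be used for design purposes.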